Tensor Multiplication on Parallel Computers

Author

  • Bryan Rasmussen
Abstract

One disadvantage of our approach is that it requires each processor to work on a large piece of the problem for a long time, thus increasing the probability that a single processor failure will sabotage the computation. Another limitation is that we must increase the number of processors in potentially large step-increments in order to take advantage of larger clusters. (The ideal number of processors is an integer multiple or divisor of the number of rows in u.)
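The processor-count constraint described above can be illustrated with a small sketch. The helper below is hypothetical (it is not from the paper); it simply enumerates the "ideal" processor counts for a given number of rows in u, assuming the paper's stated rule that the count should be an integer divisor or multiple of the row count.

```python
def valid_processor_counts(n_rows, max_procs):
    """Illustrative helper: processor counts that evenly partition the
    n_rows block-rows of u, i.e. integer divisors or multiples of n_rows.
    (Assumption based on the abstract's stated rule, not the paper's code.)"""
    divisors = {p for p in range(1, n_rows + 1) if n_rows % p == 0}
    multiples = {p for p in range(n_rows, max_procs + 1) if p % n_rows == 0}
    return sorted(p for p in divisors | multiples if p <= max_procs)

# For u with 12 rows on a cluster of up to 48 processors:
print(valid_processor_counts(12, 48))  # [1, 2, 3, 4, 6, 12, 24, 36, 48]
```

Note the large gaps between consecutive valid counts (12, 24, 36, 48) once the divisors are exhausted; this is the "large step-increments" limitation the abstract describes.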


Similar Articles

A New Parallel Matrix Multiplication Method Adapted on Fibonacci Hypercube Structure

The objective of this study was to develop a new optimal parallel algorithm for matrix multiplication that can run on a Fibonacci Hypercube structure. Most popular algorithms for parallel matrix multiplication cannot run on a Fibonacci Hypercube structure; a method that runs on all structures, and in particular on the Fibonacci Hypercube, is therefore necessary for parallel matr...


Experimental Evaluation of BSP Programming Libraries

The model of bulk-synchronous parallel computation (BSP) helps to implement portable general-purpose algorithms while keeping predictable performance on different parallel computers. Nevertheless, when programming in 'BSP style', the running time of an algorithm's implementation can depend heavily on the underlying communications library. In this study, an overview of existing approache...


Parallel Implementation of Multiple-Precision Arithmetic and 1,649,267,440,000 Decimal Digits of π Calculation

We present efficient parallel algorithms for multiple-precision arithmetic operations of more than several million decimal digits on distributed-memory parallel computers. A parallel implementation of floating-point real FFT-based multiplication is used because a key operation in fast multiple-precision arithmetic is multiplication. We also parallelized an operation of releasing propagated carr...


Generalized Hyper-Systolic Algorithm

We generalize the hyper-systolic algorithm proposed in [1] for abstract data structures on massively parallel computers with np processors. For a problem of size V, the communication complexity of the hyper-systolic algorithm is proportional to √(np·V), compared with np·V for the systolic case. The implementation technique is explained in detail and the example of the parallel matrix-matrix mu...




Journal:

Volume   Issue

Pages  -

Publication year: 2007